Validation of two novel human activity recognition models for typically developing children and children with Cerebral Palsy

Human Activity Recognition models have the potential to contribute valuable and detailed knowledge of habitual physical activity for typically developing children and children with Cerebral Palsy. The main objective of the present study was to develop and validate two Human Activity Recognition models: one trained on data from typically developing children (n = 63), and a second that also included data from children with Cerebral Palsy (n = 16), engaging in standardised activities and free play. Data were collected using accelerometers, and ground truth was established with video annotations. Additionally, we aimed to investigate the influence of window settings on model performance. Using the Extreme gradient boost (XGBoost) classifier, twelve sub-models were created, with 1-, 3- and 5-second windows, with and without overlap. Both Human Activity Recognition models demonstrated excellent predictive capabilities (>92%) for standardised activities for both typically developing children and children with Cerebral Palsy. Of all window sizes, the 1-second window performed best for all test groups. Accuracy was slightly lower (>75%) for the Cerebral Palsy test group performing free play activities. The impact of window size and overlap varied depending on activity. In summary, both Human Activity Recognition models effectively predict standardised activities, surpassing prior models for typically developing children and children with Cerebral Palsy. Notably, the model trained on combined data from typically developing children and children with Cerebral Palsy performed very well across all test groups. Researchers should select window settings aligned with their specific research objectives.


Traditionally, accelerometer data have been analysed using brand-specific software with a count-based approach, providing information about the intensity of the performed activities (7,8). Even though several population- and age-specific cut-off values for the different intensities have been developed, their validity is questionable both in typically developing (TD) children and in children with disabilities (9,10). Cut-off values for children with cerebral palsy (CP) have, for example, been shown to misclassify 30-40% of the performed intensities (11). An alternative approach is to classify unique behavioural patterns by using Human Activity Recognition (HAR) methods, and thereby identify and recognise specific activity types, such as sitting, jumping, or standing (7,12,13). HAR models have shown promising results, with over 90% accuracy in free-living conditions for various population groups, including TD preschool children (7,8). However, accuracy varies in studies involving children, ranging from 62-95%, probably depending on factors like window size, activity protocol and algorithm used (14-16).
Despite the advantages of using HAR models, challenges include the need for specific training data, optimal accelerometer placements, and window size considerations (12,17,18).
The HAR method identifies unique movement patterns, and HAR models trained on healthy adults may not accurately identify physical activity patterns in children, particularly children with disabilities, who often present with deviating movement patterns (19,20). To address this, new HAR models must be developed and trained using data collected from children, including children with disabilities. As CP is the most frequent disability among children, it is suitable to include this group in our endeavour. CP is a collective term for various neurological impairments resulting from cerebral injuries before the age of two (21-23). The condition may include a wide range of physical impairments, such as reduced walking speed and stride length (24), impaired balance (25), and secondary musculoskeletal impairments like contractures and skeletal deformities (26), all of which may give deviating movement patterns and constraints with regard to daily activity.
The previous studies validating HAR models for TD children are hard to compare, due to the use of different classifiers, window sizes, accelerometer placements and activity protocols (14,16). Moreover, accuracies often drop significantly when transitioning from laboratory-based to free-living conditions [32]. Only three publications have validated HAR models in children with CP, and they present the same issues regarding the model specifications (20,27,28). All three studies achieved good accuracy for sedentary behaviour (over 80%), but variable results for free-living activities, with some as low as 27% accuracy (28). The protocols used in these studies may not adequately represent free-living activities, lacking elements like ballgames, outdoor activities, and free play (20,27,28). Therefore, a new HAR model validated on free-living data is needed for both TD children and children with CP.
Various human activities unfold over different durations, which the HAR models need to detect. For instance, a single step takes about 500 milliseconds to complete (29). However, walking as an activity involves multiple steps, thereby extending the time frame beyond a mere 1-second interval. Consequently, the choice of window size is a contentious issue, impacting accuracy for both cut-off and HAR methods, particularly in children, who change activity frequently (19,30). Some studies suggest increased accuracy with larger window sizes (e.g., 15 seconds), while others argue for shorter windows (1-2 seconds) (15,31). Additionally, the use of overlapping windows, which provides more data points, is in some cases favoured for better classification (32).
To meet the challenges described, the present study aimed to develop and validate two machine learning models for recognizing habitual activities in TD children and ambulatory children with CP. Specifically, the two models differ in their training data: one model was trained with only TD data and the second with both TD and CP data. Additionally, this study investigated the impact of window size and of overlapping versus non-overlapping windows on prediction accuracy, and considered practical aspects related to time and storage use.

Participants
Data from 63 TD children and 16 children with CP are included in this validation study (Table 1). The TD children were recruited through a local primary and junior high school, and through colleagues and friends.

Validation protocols and test groups
The children conducted standardised semi-structured activities including different modes of running, walking, standing, sitting, and lying down, with varying durations. In addition, they performed a free-living protocol including ball games and free play, conducted both indoors and outdoors. For data-synchronisation purposes, the children also performed heel-drops, or the researcher flicked the accelerometer three times. When testing the HAR models, we divided the participants into three groups, based on activity protocol and whether they were diagnosed with CP. The TD group included the typically developing children who completed both standardised activities and five minutes of free play. One CP group, hereafter called CP Stan, comprised children with CP who completed the same standardised activities as the TD group, except for the five minutes of free play. The second CP group, hereafter called CP Free, included children with CP who only engaged in group free play activities. For both CP and TD, all activities were performed in a single session, and all TD participants conducted the whole protocol. See Supporting information S1 for the full list of activities and the number of children conducting the different activities.

Activity monitors
All participants wore two Axivity AX3 accelerometers (Axivity Ltd, Newcastle, UK): one on the thigh, placed along the anterior midline midway between the anterior superior iliac spine and the proximal patella, and one on the back at the approximate placement of L3. For the CP group, the thigh accelerometers were placed on the least affected side. Acceleration was sampled at 100 Hz and 200 Hz for TD and 100 Hz for CP (range ± 8g).

Video Recordings
Video recordings from GoPro Hero 3+ cameras were used to identify the performed activities. The cameras were mounted in corners of the room during inside protocols, play and group activities. For activities with longer duration and/or outside activities, the GoPro camera was attached with a chest harness, pointing downwards to detect leg movement, or handheld by the researcher. The recordings were sampled at 60 frames per second, with a resolution of 1080x720 pixels.

Video annotation
The video recordings were used as the ground truth for activity types. The activities in the videos were manually labelled (annotated) frame by frame for each participant using the Anvil video annotation tool (version 6) (34). Thirteen activities were labelled using activity definitions used in previous validation studies with the NTNU-HAR models (7,12,35). Definitions of activities are listed in Supporting information S2.

Data pre-processing and feature extraction
Before training our machine learning model, we performed three pre-processing steps (Fig 1). Initially, we downsampled and synchronized the thigh and back accelerometer signals with the activity annotations, ensuring data alignment at the recommended 50 Hz (13). Subsequently, we segmented the signals into signal frames of our selected window sizes (1 sec, 3 sec, and 5 sec), with and without 50% overlap. Majority voting was applied to the annotations, such that each 1-, 3-, or 5-second signal frame corresponded to exactly one activity, namely the most frequently occurring activity in that window. Lastly, we computed 161 time- and frequency-domain features for each signal frame, using the movement and gravitational components of all six sensor axes, and each sensor's vector magnitude, as described in Logacjov, Bach (12). The resulting features and annotations were used to train the machine learning models.
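The segmentation and majority-voting step can be illustrated with a minimal sketch. It assumes a signal already resampled to 50 Hz and one annotation per sample; the function name `segment_with_majority_vote` and the toy data are hypothetical and are not taken from the study's code.

```python
from collections import Counter


def segment_with_majority_vote(samples, labels, window_sec, overlap=False, fs=50):
    """Split a signal into fixed-size windows and give each window one label.

    samples, labels: equally long sequences (one annotation per sample).
    window_sec: window length in seconds (1, 3 or 5 in the study).
    overlap: if True, consecutive windows overlap by 50%.
    fs: sampling rate in Hz (50 Hz after downsampling).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    size = int(window_sec * fs)             # samples per window
    step = size // 2 if overlap else size   # 50% overlap halves the step
    windows, window_labels = [], []
    for start in range(0, len(samples) - size + 1, step):
        windows.append(samples[start:start + size])
        # Majority vote: the most frequent annotation in the window wins.
        votes = Counter(labels[start:start + size])
        window_labels.append(votes.most_common(1)[0][0])
    return windows, window_labels


# Toy example: 3 s of data, 70 samples of walking followed by 80 of standing.
signal = [0.0] * 150
annotation = ["walking"] * 70 + ["standing"] * 80
_, no_overlap = segment_with_majority_vote(signal, annotation, 1)
_, with_overlap = segment_with_majority_vote(signal, annotation, 1, overlap=True)
```

In this toy case the non-overlapping setting yields three windows and the overlapping setting five, which is the source of the extra data points (and extra processing) that overlap brings.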

Machine learning approach
We used the Extreme gradient boost (XGBoost) classifier as our machine learning method. XGBoost is an ensemble learning approach based on the gradient boosting algorithm (36), where multiple weak classifiers (e.g., decision trees) are trained in a sequential manner. Each weak classifier is trained to minimize the errors made by the previous weak classifiers. The final model prediction is the weighted sum of all weak classifiers' predictions.
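The sequential idea behind gradient boosting can be shown with a deliberately simplified sketch. This is not XGBoost and not the study's implementation: it uses one-split regression stumps on a single feature with squared-error loss, whereas XGBoost adds regularisation, second-order gradients and full decision trees. All names and the toy data are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on a 1-D feature that best fits the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right) if right else left_mean
        error = sum(
            (r - (left_mean if x <= threshold else right_mean)) ** 2
            for x, r in zip(xs, residuals)
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def boost(xs, ys, rounds, learning_rate=0.5):
    """Fit stumps one after another, each on the errors left by its predecessors."""
    stumps, predictions = [], [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # The ensemble prediction is the weighted sum of all stump outputs.
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return stumps, predictions


def squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions))


xs = [1, 2, 3, 4, 5, 6]
ys = [0.0, 0.0, 1.0, 1.0, 0.0, 1.0]
error_after_1 = squared_error(ys, boost(xs, ys, rounds=1)[1])
error_after_20 = squared_error(ys, boost(xs, ys, rounds=20)[1])
```

Each added stump corrects part of what the earlier ones got wrong, so the training error after 20 rounds is lower than after one.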

Model training and cross validation (evaluation)
The model training resulted in two HAR models, NTNU-HAR-Children (HAR-Children) and NTNU-HAR-ChildrenCP (HAR-ChildrenCP). These models comprise a total of 12 sub-models, covering 1-, 3-, and 5-second windows with and without overlapping windows. HAR-Children was trained with data from the TD group, and HAR-ChildrenCP was trained with additional data from children with CP. For each model we initially performed a 6-fold cross-validation with hyperparameter optimization in the form of a grid search. This allowed us to find optimal hyperparameters for each of the 12 models, leading to a fairer comparison.
After finding optimal hyperparameters for each model, we performed 12 leave-one-subject-out cross-validations (LOSOCV), one for each model. In the LOSOCV, the model was trained on all participants except one, and this participant became the test data. The overall performance of the model was estimated by repeating this process for each individual in the dataset and then averaging the performance across individuals (12,16). This gave us less subject-dependent estimates and thereby less subject-based bias (12,16,37). Note that the six HAR-Children models were trained without the CP data. Hence, the performed LOSOCVs only provided test results for the TD group in this model. To get the results for the CP groups from the HAR-Children models, we trained the model on the whole TD dataset and then used the CP data as test data. Additionally, before we compared the results of the different window sizes (1 sec, 3 sec, 5 sec), we unfolded the model predictions to the original 50 samples per second, to make all our models comparable. The complete dataset is available at (link to the dataset available upon paper acceptance) and is named NTNU-Children.
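The leave-one-subject-out split and the unfolding of window-level predictions back to 50 samples per second can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and the unfolding shown covers non-overlapping windows, since the handling of overlapping windows is not described here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out, train_idx, test_idx), one fold per unique subject."""
    for held_out in sorted(set(subject_ids)):
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


def unfold_predictions(window_predictions, window_sec, fs=50):
    """Repeat each window-level prediction so there is one label per sample."""
    per_window = int(window_sec * fs)
    return [label for label in window_predictions for _ in range(per_window)]


# One entry per signal frame, naming the participant it came from.
subjects = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))

# Two 3-second predictions become 2 * 150 sample-level predictions.
unfolded = unfold_predictions(["sitting", "walking"], window_sec=3)
```

Because every frame of the held-out participant is kept out of training, no fold is evaluated on a child the model has already seen. After unfolding, predictions from 1-, 3- and 5-second models share the same sample-level resolution and can be compared directly.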

Data post processing
Initially, we annotated with all activity definitions provided in Supporting information S2. We chose to do this to provide precision and to avoid confusing the model when training it on similar movements. For the further processing, we collapsed some activity labels to make the models applicable for practical use. We have defined shuffling as standing with small foot movements (see Supporting information S2), and for our current focus the differentiation between standing still and standing with some foot movement is of limited significance. Therefore, shuffling was embedded into standing. Similarly, bending is an activity that typically occurs when standing and was therefore collapsed with standing. To avoid confusing our model's ability to recognize level walking, walking up and down stairs was collapsed with walking.
The original two categories of cycling, sit cycling and stand cycling, were collapsed into cycling, as our primary interest was in recognizing the cyclic leg movements. These collapsed activity classes are the same as those used in previous NTNU-HAR models (7,12,35). The overall preliminary results are provided in Supporting information S3, in their originally annotated form.
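The collapsing described above amounts to a label mapping applied to both annotations and predictions. The sketch below assumes hypothetical label strings; the actual label names are those defined in Supporting information S2.

```python
# Hypothetical label names; the real definitions are in Supporting information S2.
COLLAPSE = {
    "shuffling": "standing",
    "bending": "standing",
    "stairs (ascending)": "walking",
    "stairs (descending)": "walking",
    "cycling (sit)": "cycling",
    "cycling (stand)": "cycling",
}


def collapse_labels(labels, mapping=COLLAPSE):
    """Map fine-grained labels to collapsed classes; others pass through unchanged."""
    return [mapping.get(label, label) for label in labels]


collapsed = collapse_labels(["shuffling", "running", "cycling (sit)", "bending"])
```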

Statistical Analysis
We assessed the HAR models by calculating the overall accuracy for each test group and determining precision, sensitivity, specificity, and F1 Score for each activity type. Sensitivity measured our model's ability to correctly classify activities when they occurred, while specificity evaluated the ability to avoid false recognition when activities were absent. Precision indicated the ratio of correctly classified activities to the sum of correctly and falsely classified activities. The F1 Score, the harmonic mean of precision and sensitivity, provided a combined measure of the two. Accuracy was calculated as the ratio of correctly recognized activity samples to the total number of activity samples. These metrics range from 0-1, with higher values indicating superior performance. The confusion matrixes include the same collapsed activity classes. Subjects who did not perform an activity were excluded from the average calculations for that activity. We performed all these calculations for each subject before calculating group means and confidence intervals. All calculations were executed in MATLAB.
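The metric definitions above can be made concrete with a short sketch. The study's calculations were executed in MATLAB; the Python below is only an illustration of the same definitions, with hypothetical function names and toy labels, applied to the sample-level labels of a single subject.

```python
def per_activity_metrics(y_true, y_pred, activity):
    """One-vs-rest precision, sensitivity, specificity and F1 for one activity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == activity and p == activity for t, p in pairs)
    fp = sum(t != activity and p == activity for t, p in pairs)
    fn = sum(t == activity and p != activity for t, p in pairs)
    tn = len(pairs) - tp - fp - fn

    def ratio(numerator, denominator):
        # Undefined ratios (e.g. activity never performed) are returned as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        # Equivalent to the harmonic mean of precision and sensitivity.
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def accuracy(y_true, y_pred):
    """Share of samples whose predicted activity matches the annotation."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = ["walking"] * 4 + ["standing"] * 4 + ["running"] * 2
y_pred = ["walking", "walking", "walking", "standing",
          "standing", "standing", "standing", "walking",
          "running", "walking"]
walking = per_activity_metrics(y_true, y_pred, "walking")
overall = accuracy(y_true, y_pred)
```

For walking in this toy example there are 3 true positives, 2 false positives, 1 false negative and 4 true negatives, giving a precision of 0.60, sensitivity of 0.75, specificity of 0.67 and F1 Score of 0.67; overall accuracy is 0.70.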

Overall performance of the twelve HAR-models
Overall accuracy was high for all models, with the same median value (0.93) (Table 2), and favoured the TD and CP Stan groups over CP Free (difference range: 0.12-0.18). The 1-second models in both HAR-Children and HAR-ChildrenCP performed very well for the respective test groups (Table 2). The largest difference in accuracy was between the corresponding HAR-Children and HAR-ChildrenCP models for the test group CP Free (range: 0.2-0.4, see Table 2). As Table 2 shows, overlapping and non-overlapping windows differed only for the HAR-Children 1- and 3-second models, and from this point on we present the non-overlapping models. Non-overlapping models are preferred for their efficiency in processing time and storage, making them more practical for later use. Accuracy for the original activity classes is presented in Supporting information S3. All experiment results can be found on GitHub (link available after paper acceptance).

Specific activity performance of the six selected HAR-models
In both 1-second models (Table 3a and 3b), all activities were predicted with high accuracy for the TD group, with F1 Scores over 0.85. In CP Stan, only running (range: 0.79-0.80) and cycling (range: 0.75-0.78) had an F1 Score under 0.90. The CP Free group had slightly lower accuracy and a wider range in both 1-second models (range: 0.29-0.94). For this group, F1 Scores were higher with the HAR-ChildrenCP model, with an average difference of 0.09 (Table 3b); walking, running, and jumping had the largest differences (0.09, 0.07, and 0.27, respectively).
For the TD group, the HAR-Children 5-second model (Table 3e) had lower F1 Scores in all activities (range of decrease in F1: 0.01-0.09) compared to the 1- and 3-second models (Table 3a, b, c, d), except for cycling, where the F1 Score was higher (< 0.06 increase in F1) with the 5-second models. In the two CP groups, the HAR-ChildrenCP 5-second model (Table 3f) had higher F1 Scores (range of increase in F1: 0.01-0.04) than the HAR-Children 5-second model (Table 3e), except for the activity lying. When comparing the HAR-ChildrenCP 1-second model (Table 3b) and the 5-second model (Table 3f) for the CP Stan group, the average difference in F1 Score was 0.03. Here, too, the cycling score was higher with the 5-second model. In CP Free, the average difference in F1 Score was 0.05 in favour of the 1-second model. Note that this average excludes jumping, which showed the largest difference (Table 3e and 3f).

Confusion matrixes
In all the confusion matrixes in Figs 3, 4 and 5, the misclassification was higher in the test group CP Free for all HAR-models, where walking was misclassified as standing in 37% of the instances on average, and standing as walking in 10.8%. Running was more often misclassified as walking in the two CP groups, with an average of 34.2% in the CP Free group and 17% in CP Stan. Jumping was often misclassified in CP Free, with 100% misclassification in both the 3- and 5-second models; for CP Stan, however, jumping was well predicted for all models, and better than for the TD group. The difference between HAR-Children and HAR-ChildrenCP decreased with window size (Figs 3, 4, 5). Furthermore, for the CP Free test group, the performance of the HAR-ChildrenCP models decreased with increasing window size while that of the HAR-Children models increased; this was not the case for the other test groups (Figs 3, 4, 5).

In Figs 3, 4 and 5, the rows represent the video-annotated (labelled) activity types and the columns represent the predicted activity types. All numbers are percentages.

Discussion
This study validates two HAR models' ability to predict habitual physical activities in TD children and children with CP (GMFCS I & II). Both models accurately predict standardised activities, with the best overall accuracy for the CP Stan group. The two 1-second models outperform the other models for all test groups. The HAR-ChildrenCP 1-second model had preferable performance for all test groups in all activities except for cycling, where the 3- or 5-second models perform better. In the CP Free group there was more variability between models and wider confidence intervals within activities. There were also larger differences in F1 Scores between the HAR-Children and HAR-ChildrenCP models when tested on the CP Free group. The results of the present study show that window size is of importance, and that the ideal size depends on the target activity. The most challenging activities for the models to correctly predict are running, walking, and jumping for the CP groups.

HAR-ChildrenCP
The goal of activity recognition is to correctly predict daily life activities, particularly in free-living situations. Our overall accuracy for TD children ranges from 92-95%, representing an improvement over prior models designed for the same age group and simulated free living, which report accuracy between 62-86% (14,15). Notably, studies with higher accuracy exist; however, these were tested on a treadmill (16), limiting their applicability to real-life situations. The results of the present study align favourably with previous studies including standardised protocols and children with CP, which have achieved accuracy exceeding 90% (20,27,28). Moreover, our study's achievement of accuracies >75% in simulated free-living conditions for children with CP is particularly notable, given the limited existing research with suboptimal accuracy (28). Our model trained with free play activities also performs best for the CP Free group, highlighting the importance of specific training data (19,20).
The two models consistently demonstrate superior performance when applied to the TD and CP Stan groups, compared to the CP Free group. Intriguingly, in some models, the CP Stan group achieves higher overall and activity-specific performance than the TD group. This implies that the HAR models may not be primarily challenged by the deviating movement patterns in CP. Instead, the presence of unstandardised activities and sporadic transitions within free play emerges as a potential challenge for HAR models. In our data, the inferior performance for the two groups that include free play, TD and CP Free, might also be due to the limited training data, with a mere five minutes of free play in TD and six participants in CP Free.
Nevertheless, free play activities or variability in movement patterns might be beneficial for the model's ability to predict standardised activities, both for TD children and children with CP. This is exemplified by the superior performance of the HAR-ChildrenCP models for CP Stan and, in all activities except jumping, for the TD group. This is interesting because one would expect the heterogeneity within the CP population to hinder the model's performance. However, in our experiment, this diversity within the CP population could potentially contribute to improved performance for standardised activities, thereby benefiting both TD and CP Stan. Overall, the HAR-ChildrenCP model includes all 79 children and thereby represents a wide variation of movement patterns, and with the contribution of the CP Free group it offers superior performance in most activities for all test groups.

Accuracy and misclassification of activities
With regard to misclassification of activities, the two groups with standardised activities had a lower misclassification of jumping compared to free play. We can theorise that structured, sequential jumping, as seen in these protocols, is easier for the models to recognize than sporadic jumping or jumping embedded in other activities in free play. Conversely, even though all jumping from the TD and CP Stan groups were included in the training, it might

Our models confirm the significance of window settings in activity recognition, and that their impact depends on the target activity. For instance, momentary activities such as jumping are, in our models, best detected with the 1-second window, while cyclic activities with longer cycle lengths, such as cycling, are better detected with the 5-second window. Interestingly, when the window size was increased to 5 seconds, the disparity in performance between the HAR-Children and HAR-ChildrenCP models diminished in the CP Free group. This suggests that the intermittent changes in the CP Free group are concealed by majority voting in the 5-second window, and conceivably that the intermittent activities in the training data play a reduced role for model performance. This data reduction effect of the larger windows has been beneficial in previous research and for specific activities (39), such as walking, cycling, and running, which have cycle rates exceeding one second. Existing literature involving children with CP typically uses window sizes ranging from five to fifteen seconds (20,27,28,40), reporting higher F1 Scores with larger windows (20). Ferrari & Micucci (32) advocate for windows that are long enough to capture a complete cycle of a specific movement yet short enough to distinguish between similar movements.
Within the context of evaluating children's daily activities, our data suggest that the 1-second models should be used when the objective is to detect momentary activities, such as jumping. However, it is essential to recognise that these momentary actions often transpire within a broader spectrum of gross motor activities, such as running, walking, and standing, which are often of primary interest. In this context, larger window sizes offer a more efficient approach to data reduction and improve model performance in typical daily life scenarios. Hence, in the context of everyday life and health outcomes, the 3-second model emerges as potentially preferable. This choice is substantiated by our results, where the 3-second model predicts momentary activities, but also activities of extended duration, including cycling, with commendable accuracy.
Regarding the choice between overlapping and non-overlapping windows, our study reveals a marginal difference in model performance, with slightly better accuracy observed with overlapping windows. The optimal window size represents a trade-off between processing speed and prediction accuracy (18). Consequently, the benefits of overlapping windows must be weighed against the drawbacks, particularly in clinical application. The time-consuming nature of processing overlapping windows is a notable concern. In our model training, the 1-second model with overlap required approximately ten times longer processing time than the 5-second model without overlap. This substantial time difference is consistent with the findings of Dehghani & Sarbishei (41), where segmentation with overlapping windows took twice as long, and training took four times as long, compared to non-overlapping windows. Furthermore, in their study, the memory requirements for overlapping windows were nearly nine times greater (41). Given the trade-off in accuracy, questions arise regarding the practical utility of overlapping windows in a clinical context, especially when dealing with large data sets.

Strengths and limitations
Comparing HAR models presents considerable challenges due to variations in classifier usage, study populations, activity protocols, accelerometers, and accelerometer placements, among other factors. Our study is distinctive in its use of the XGBoost classifier and two accelerometers. The use of two or more accelerometers and our placements have been emphasized as preferred settings (12,27). The relatively infrequent use of the XGBoost classifier in HAR research contrasts with its prevalence and recommendation in other fields, where it has shown strengths, particularly due to its sequential learning (7,12). Moreover, our study incorporates specific configurations and specifications that have demonstrated excellence in other population groups (7,12,35). Additionally, our study is strengthened by the inclusion of playful behaviour and group activities, thereby simulating habitual activities for TD children and children with CP. Furthermore, our data set is relatively large in comparison to other validation studies including children.
Some limitations warrant consideration. We have limited free play training data, potentially impacting prediction accuracy for the CP Free group. Additionally, the absence of a TD Free group prevents a direct examination of the contrasting effects of free play versus the presence of CP on model performance.

Future perspectives and implications
The present article underscores the specificity of the HAR method, emphasizing the importance of the training data as well as technical specifications for future research and clinical work. One noteworthy consideration is whether a specific HAR model is necessary for children with CP (GMFCS I & II), as the disparity between HAR-Children and HAR-ChildrenCP appears relatively minor for standardised activities. Conversely, our results suggest that the need for the HAR-ChildrenCP model, or a model for children with disabilities, is not restricted to children with CP but extends to TD children. Such a model, like our HAR-ChildrenCP, may offer enhanced versatility through greater variation within the training data, facilitating the recognition of a broader spectrum of movement patterns in the general child population. An intriguing avenue for future research involves training our model with activity data collected in children's daily routines, such as in their home, kindergarten, school, and leisure activities. This expansion into real-world scenarios could yield valuable insights and advance the applicability of our model and HAR methods in child populations.

Conclusions
Both HAR models demonstrate precise predictions for standardised activities in both TD children and those with CP, and slightly less precise predictions for free play activities that are nevertheless precise and favourable compared to previous models. Among the three groups, CP Stan has the most accurate predictions, prompting consideration of the influence of impairment versus free play activities. The NTNU HAR-ChildrenCP model with 1-second window exhibits the highest overall accuracy across all three test groups. However, larger window sizes might be recommended for population measurements. The choice of the optimal window size and overlap depends on the target activity.

Figure 1. The pre-processing and feature extraction, before model training.

Fig 2 shows the distribution of the performed activities in the three test groups. There was more

Figure 2. Bar plot of total amount of activity in the three test groups: TD=Typically developing

Figure 3. Confusion matrix for the 1 second models with non-overlapping windows. One

Figure 4. Confusion matrix for the 3 second models with non-overlapping windows. One

Figure 5. Confusion matrix for the 5 second models with non-overlapping windows. One

Table 2. Overall accuracy for each model with and without overlaps. Test groups: TD = Typically developing children, CP Stan = Cerebral palsy with standardised activities, CP Free = Cerebral palsy with free play. The range of accuracy is 0-1, where higher scores are better.

Table 3. Sensitivity, specificity, precision, and F1 Score calculated as mean across all participants in the test group and [95% confidence interval]. Sub = Subjects, number of participants detected with the activity. Each sub-table represents one of the HAR-models for the three predefined test groups. TD = Typically developing, CP Stan = Cerebral palsy, standardised activities, CP Free = Cerebral palsy, free play.